http://zinc.docking.org/
SEARCH: One can compose a query by specifying molecular properties (net charge, xLogP, rotatable bonds, H-bond donors, polar surface area, molecular weight, etc.) or molecular constitution (SMILES/SMARTS). One may also specify ZINC IDs and original catalog numbers.
INFORMATION:
Supplier information; Representations (links to other databases)
Properties: xLogP, apolar and polar desolvation, HBD, HBA, charge, Mwt, NRB (comment: no units, no solvent data, no descriptions)
DUD, a directory of useful decoys for benchmarking virtual screening. DUD is designed to help test docking algorithms by providing challenging decoys. It contains:
A total of 2,950 active compounds against a total of 40 targets
For each active, 36 "decoys" with similar physical properties (e.g. molecular weight, calculated LogP) but dissimilar topology.
Every ligand has 36 decoy molecules that are physically similar but topologically distinct, leading to a database of 98266 compounds. For most targets, enrichment was at least half a log better with uncorrected databases such as the MDDR than with DUD, evidence of bias in the former. These calculations also allowed 40 × 40 cross-docking, where the enrichments of each ligand set could be compared for all 40 targets, enabling a specificity metric for the docking screens.
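The figures quoted above can be checked with a few lines of shell arithmetic. This is only a sketch: it assumes that the 98266 figure counts actives and decoys together, which the text does not state explicitly, and the variable names are made up for illustration. The last line converts "half a log" into a plain enrichment factor.

```shell
#!/bin/bash
# Numbers taken from the DUD description above.
actives=2950
decoys_per_active=36
database_size=98266

# Nominal decoy count if every active had 36 decoys of its own.
nominal_decoys=$(( actives * decoys_per_active ))

# Decoys implied by the database size, ASSUMING it includes the actives.
decoys_in_db=$(( database_size - actives ))

# "Half a log" better enrichment corresponds to a factor of 10^0.5.
half_log_factor=$(LC_ALL=C awk 'BEGIN { printf "%.2f", 10 ^ 0.5 }')

echo "Nominal decoys (2950 x 36): $nominal_decoys"
echo "Decoys implied by the database size: $decoys_in_db"
echo "Half a log unit as an enrichment factor: $half_log_factor"
```

Under that assumption the database holds fewer decoys than the nominal product. The text here does not say why, so the gap should not be over-interpreted.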
The DrugBank database is a unique bioinformatics and cheminformatics resource that combines detailed drug (i.e. chemical, pharmacological and pharmaceutical) data with comprehensive drug target (i.e. sequence, structure, and pathway) information. The database contains nearly 4,800 drug entries including >1,350 FDA-approved small molecule drugs, 123 FDA-approved biotech (protein/peptide) drugs, 71 nutraceuticals and >3,243 experimental drugs.
SEARCH: 1) simple, 2) advanced
INFORMATION:
Name, Synonyms
Drug Type
Brand Names, Brand Mixtures
Chemical IUPAC Name, Chemical Formula, Chemical Structure
RxList Link, PDRhealth Link, Wikipedia Link
Melting Point, Water Solubility (Experimental, Predicted), LogP/Hydrophobicity (Experimental, Predicted), LogS (Experimental, Predicted), Caco2 Permeability, pKa/Isoelectric Point
Structures: 2D, 3D, MOL, SDF, PDB, SMILES
Pharmacology, Mechanism of Action, Absorption, Toxicity, Protein Binding, Biotransformation, Half Life, Pathways, Patient Information
ChemSpider links together compound information from across the web, providing free-text and structure search access to millions of chemical structures. With an abundance of additional property information, tools to upload, curate and use the data, and integration with a multitude of other online services, ChemSpider is the richest single source of structure-based chemistry information. It is provided to the community by the Royal Society of Chemistry.
SEARCH: 1) simple, 2) advanced
INFORMATION:
Empirical Formula, Molecular Weight, Mass
Wikipedia Article(s)
Associated data sources (links), Patents, Articles
Properties: Predicted (logP, LogD, volume, SASA, melting point, solubility, etc.) and Experimental
RISM-MOL
*1. To add a key to key files for selection. Example:
P = getDataSets(DataSetsPath,'Frolov_TestSet','c');
Interactively deconstruct target compounds into component precursors and reconstruct similar building-blocks into combinatorial libraries representing the "virtual chemical space" near the target compound.
Interactive system for learning and practicing reactions, syntheses and mechanisms in organic chemistry, with advanced support for the automatic generation of random problems, curved-arrow mechanism diagrams, and inquiry-based learning.
Datasets: For Machine Learning and Searching Experiments
Various available chemical datasets annotated with interesting properties to train and test machine-learning prediction and searching methods.
The Catalytic Site Atlas (CSA) is a database documenting enzyme active sites and catalytic residues in enzymes of 3D structure. We defined a classification of catalytic residues which includes only those residues thought to be directly involved in some aspect of the reaction catalysed by an enzyme. For a full description of the classification, see Reference 2. The CSA contains 2 types of entry:
Original hand-annotated entries, derived from the primary literature. References for these entries are given.
Homologous entries, found by PSI-BLAST alignment (using an e value cut-off of 0.00005) to one of the original entries. The equivalent residues, which align in sequence to the catalytic residues found in the original entry are documented.
Access to the CSA is via PDB code, SWISS-PROT entry or E.C. number. Accessing via PDB code takes you straight to the CSA entry for that PDB, while accessing via SWISS-PROT or E.C. number gives a list of all PDB codes for structures assigned that particular SWISS-PROT identifier or E.C. number. Structures with entries in the CSA are given as hyperlinks.
Each CSA entry lists the catalytic residues found in that entry, using PDB residue numbering. Each site is also marked with an evidence tag, which is either "Literature reference" or "PSI-BLAST hit". If the entry is a PSI-BLAST hit you can follow the link to the original entry. The active site can be visualised using RasMol.
Each entry contains a link to a list of homologous entries found by PSI-BLAST, and a link to other PDB structures with identical E.C. numbers or SWISS-PROT identifier to the entry you are viewing.
References:
1. The Catalytic Site Atlas: a resource of catalytic sites and residues identified in enzymes using structural data. Craig T. Porter, Gail J. Bartlett, and Janet M. Thornton (2004) Nucl. Acids Res. 32: D129-D133.
2. Analysis of Catalytic Residues in Enzyme Active Sites. Gail J. Bartlett, Craig T. Porter, Neera Borkakoti, and Janet M. Thornton (2002) J. Mol. Biol. 324: 105-121.
3. Using a Library of Structural Templates to Recognise Catalytic Sites and Explore their Evolution in Homologous Families. James W. Torrance, Gail J. Bartlett, Craig T. Porter, and Janet M. Thornton (2005) J. Mol. Biol. 347: 565-581.
Wiki
Zinc
PubChem
DUD
CDD
e-Molecules
PDB
Drug Bank
Chem Spider
PubMed
RISM-MOL
PAN Pesticide Database
MONARPOP
ChemDB
ChemNavigator
Ligand.Info
ILThermo
pKa in non-aqueous media
Open Notebook Science Challenge
NIST
Fundamental Physical Constants
Online Databases
Molport
Catalytic Site Atlas database
Wiki
http://www.wikipedia.org/
Example of INFORMATION:
acetylsalicylate
acetylsalicylic acid
O-acetylsalicylic acid
1 g dose: 5 h
2 g dose: 9 h
Zinc
http://zinc.docking.org/
SEARCH: One can compose a query by specifying molecular properties (net charge, xLogP, rotatable bonds, H-bond donors, polar surface area, molecular weight, etc.) or molecular constitution (SMILES/SMARTS). One may also specify ZINC IDs and original catalog numbers.
INFORMATION:
PubChem
http://pubchem.ncbi.nlm.nih.gov/
SEARCH: 1) simple (chemical name); 2) advanced (Chemical Properties, Stereochemistry, BioAssays, Links, Elements)
INFORMATION:
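DUD
http://dud.docking.org/
http://pubs.acs.org/doi/abs/10.1021/jm0608356
DUD, a directory of useful decoys for benchmarking virtual screening. DUD is designed to help test docking algorithms by providing challenging decoys. It contains:
- A total of 2,950 active compounds against a total of 40 targets
- For each active, 36 "decoys" with similar physical properties (e.g. molecular weight, calculated LogP) but dissimilar topology.
Every ligand has 36 decoy molecules that are physically similar but topologically distinct, leading to a database of 98,266 compounds. For most targets, enrichment was at least half a log better with uncorrected databases such as the MDDR than with DUD, evidence of bias in the former. These calculations also allowed 40 × 40 cross-docking, where the enrichments of each ligand set could be compared for all 40 targets, enabling a specificity metric for the docking screens.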
CDD
http://www.collaborativedrug.com/
Collaborative Drug Discovery's web-based software organizes preclinical research data to help scientists advance new drug candidates more effectively.
e-Molecules
http://www.emolecules.com/
INFORMATION:
PDB
http://www.rcsb.org/pdb/home/home.do
The PDB archive contains information about experimentally determined structures of proteins, nucleic acids, and complex assemblies. As a member of the wwPDB, the RCSB PDB curates and annotates PDB data according to agreed-upon standards. The RCSB PDB also provides a variety of tools and resources. Users can perform simple and advanced searches based on annotations relating to sequence, structure and function. These molecules are visualized, downloaded, and analyzed by users who range from students to specialized scientists.
Drug Bank
http://www.drugbank.ca/
The DrugBank database is a unique bioinformatics and cheminformatics resource that combines detailed drug (i.e. chemical, pharmacological and pharmaceutical) data with comprehensive drug target (i.e. sequence, structure, and pathway) information. The database contains nearly 4,800 drug entries including >1,350 FDA-approved small molecule drugs, 123 FDA-approved biotech (protein/peptide) drugs, 71 nutraceuticals and >3,243 experimental drugs.
SEARCH: 1) simple, 2) advanced
INFORMATION:
Chem Spider
http://www.chemspider.com/
ChemSpider links together compound information from across the web, providing free-text and structure search access to millions of chemical structures. With an abundance of additional property information, tools to upload, curate and use the data, and integration with a multitude of other online services, ChemSpider is the richest single source of structure-based chemistry information. It is provided to the community by the Royal Society of Chemistry.
SEARCH: 1) simple 2) advanced
INFORMATION:
PubMed
http://www.ncbi.nlm.nih.gov/pubmed
PubMed comprises more than 19 million citations for biomedical literature from MEDLINE, life science journals, and online books. Citations may include links to full-text content from PubMed Central and publisher web sites.
RISM-MOL
*1. To add a key to key files for selection. Example:
P = getDataSets(DataSetsPath,'Frolov_TestSet','c');
*2. The script:
Usage: ./script MolIndex
- takes as command-line input the index of the molecule (in the order the molecules appear in the *.m file) from which to start the calculations
- gets the molecule names from the /net/maxwell/people/frolov/distr/RISM_MOL/LastReleaseRismMol/Systems/TestSet_script_full.m file
- runs a loop over the molecules in the list, starting with the one at the specified index
- starts MATLAB, which executes all the commands in the RunSet_TestSet.m file (with the molecule index as input)
- waits until the MATLAB process disappears from the list of running processes or until MATLAB has been running for more than 5 min. If the process is still in the list after 5 min, it is killed and a warning is written.
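#!/bin/bash
# Usage: ./script StartInd
# Runs RunSet_TestSet(MolInd) in MATLAB for every molecule from StartInd on
# and kills any MATLAB process that is still running after 5 minutes.
if [ $# -lt 1 ]; then
    echo "Usage: $0 StartInd"
    exit 1
fi
StartInd=$1
list=$(cat /net/maxwell/people/frolov/distr/RISM_MOL/LastReleaseRismMol/Systems/TestSet_script_full.m)
ind=0
for val in $list ; do
    ind=$((ind+1))
    # The first token is "TestSet={...", so molecule k is token k+1.
    if [ "$ind" -ge $((StartInd+1)) ]; then
        # Strip the leading quote and the trailing "',..." from the token.
        mol=${val:1:$(( ${#val}-6 ))}
        MolInd=$((ind-1))
        echo "Running mol: $MolInd $mol"
        /opt/local/bin/matlab -r "RunSet_TestSet($MolInd)" -nojvm -nosplash & pid=$!
        echo "$pid" > tmp_pid
        i=0
        # Poll every 5 s, for at most 300 s, while the process is still listed.
        # ps -p matches the exact pid (grep $pid could also match other pids).
        while [ "$i" -lt 300 ] && ps -p "$pid" > /dev/null ; do
            sleep 5
            i=$((i+5))
            echo "$i"
        done
        if ps -p "$pid" > /dev/null ; then
            echo "WARNING!!! KILLING pid: $pid"
            kill "$pid"
        fi
    fi
done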
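The wait-then-kill step of the script can also be factored into a small reusable function. The following is a sketch only: run_with_timeout is a hypothetical helper name, the return code 124 for a killed command is an arbitrary choice, and liveness is tested with kill -0 rather than by inspecting the process list.

```shell
#!/bin/bash
# run_with_timeout LIMIT_SECONDS COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND in the background and polls once per second.
# Returns the command's exit status, or 124 if the command had to be killed.
run_with_timeout() {
    local limit=$1
    shift
    "$@" &
    local pid=$!
    local waited=0
    # kill -0 sends no signal; it only tests whether the process still exists.
    while [ "$waited" -lt "$limit" ] && kill -0 "$pid" 2>/dev/null; do
        sleep 1
        waited=$((waited + 1))
    done
    if kill -0 "$pid" 2>/dev/null; then
        echo "WARNING: killing pid $pid after ${limit}s" >&2
        kill "$pid"
        wait "$pid" 2>/dev/null
        return 124
    fi
    # The process has finished; collect its exit status.
    wait "$pid"
}
```

With such a helper, the MATLAB call in the loop could in principle be written as run_with_timeout 300 /opt/local/bin/matlab -r "RunSet_TestSet($MolInd)" -nojvm -nosplash, assuming that killing the launched process is enough to stop MATLAB.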
The contents of the "/net/maxwell/people/frolov/distr/RISM_MOL/LastReleaseRismMol/Systems/TestSet_script_full.m":
TestSet={...
'1,1,1,2-tetrachloroethane',...
'1,1,1-trichloroethane',...
'1,1,2,2-tetrachloroethane',...
...
'trichloromethane',...
'undecan-2-one',...
'tmp'};
Note: the 'tmp' name has to be placed at the end of the list so that the last real molecule keeps the trailing ",..." that the script strips from every entry. After calculating all molecules, the script will try to calculate the "t" ("tmp") molecule and will fail. This is OK.
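The reason for the "t" molecule becomes clearer when the substring expansion used in the script is applied to individual tokens. A minimal sketch, in which strip_token is a hypothetical wrapper around the same expansion:

```shell
#!/bin/bash
# strip_token TOKEN
# Same expansion as in the script: skip the first character (the opening quote)
# and drop the last five characters (the closing quote plus ",...").
strip_token() {
    local val=$1
    echo "${val:1:$(( ${#val} - 6 ))}"
}

strip_token "'1,1,1-trichloroethane',..."   # a regular entry
strip_token "'tmp'};"                        # the final placeholder entry
strip_token "'undecan-2-one'};"              # a real name placed last would be truncated
```

The final entry ends in "'};" (three characters) instead of "',..." (five), so two characters too many are removed. That turns 'tmp' into "t" and would cut the end off a real molecule name placed last, which is why the placeholder is needed.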
The contents of "RunSet_TestSet.m":
- sets the path to the database functions
- runs the script that defines the cell array of molecule names
- sets the DataSetsPath
- runs a loop for one entry: gets the RISM input, then runs the RISM input with "do_something" and "StartRISMScript".
function RunSet_TestSet(ind)
% Runs the RISM calculation for the molecule with index ind in TestSet.
% The function is named after its file, RunSet_TestSet.m, which is the name the shell script calls.

% Make the database functions available.
path(path,'/net/v215-2/data4/fedorov-group/Database/bin/');

% Load the cell array TestSet with the molecule names.
run '/net/maxwell/people/frolov/distr/RISM_MOL/LastReleaseRismMol/Systems/TestSet_script_full.m';
%run /net/maxwell/people/frolov/distr/RISM_MOL/LastReleaseRismMol/Systems/TestSet_script.m

DataSetsPath = '/net/v215-2/data4/fedorov-group/Database/DataSets';
P = getDataSets(DataSetsPath,'Frolov_TestSet','System');
% Units for distances and energies.
ou = struct('Distance','Angstr','Energy','kcal_mol');

% The loop runs for the single entry ind only.
for i=ind:ind
    disp(i)
    name=TestSet{i};
    [FL,KL,KFL,PL]=multiSelect(P,'Name',name);
    do_something(StartRISMScript(DataSetsPath,ou,'user_Closure=''HNC''; user_MixingRules=''LorentzBerthelot''; user_LambdaCoupling=0.5;'),FL,KL,KFL,PL);
end
% Close MATLAB so that the shell script can move on to the next molecule.
quit
end
Hope this helps!
PAN Pesticide Database
http://www.pesticideinfo.org/
MONARPOP
http://www.monarpop.at/
ChemDB
http://cdb.ics.uci.edu/
Predicts 3D Structure from SMILES
Generates 2D Images from SMILES
Molecule File Format Converter
Calculate / Predict Molecular Properties
Product Library Generation
Counts Functional Groups (sub-structures)
Screens Molecules by Functional Group Count
Fragments Molecules for Mass Spec Analysis
Searches ChemDB by Monoisotopic Mass and Substructure Filtering
ChemNavigator
http://www.chemnavigator.com/
Ligand.Info
http://ligand.info/
Ligand.Info is a compilation of various publicly available databases of small molecules such as ChemBank, ChemPDB, KEGG, NCI, AKos GmbH, Asinex Ltd, and TimTec. The total size of the Meta-Database is 1 million entries. The compound records contain calculated three-dimensional coordinates and sometimes information about biological activity. Some molecules have information about FDA drug approval status or about anti-HIV activity. The Meta-Database can be downloaded in SDF format and used for virtual high-throughput screening of new potential drugs. The database can also be screened using a Java-based tool.
ILThermo
http://ilthermo.boulder.nist.gov/ILThermo/
ILThermo is a free web research tool that gives users access to an up-to-date collection of data from publications on experimental investigations of the thermodynamic and transport properties of ionic liquids, as well as of binary and ternary mixtures containing ionic liquids.
pKa in non-aqueous media
http://tera.chem.ut.ee/~ivo/HA_UT/
A kind of database: a set of links to sources with pKa values in non-aqueous media.
Open Notebook Science Challenge
http://onschallenge.wikispaces.com/
Phys-Chem properties
NIST
http://www.nist.gov/srd/
Fundamental Physical Constants
http://physics.nist.gov/cuu/Constants/index.html
Online Databases
http://www.nist.gov/srd/onlinelist.cfm
MolPort
http://www.molport.com/buy-chemicals/moleculelink/N-2-hydroxyphenyl-acetamide/900799
Catalytic Site Atlas database
http://www.ebi.ac.uk/thornton-srv/databases/CSA/
Introduction
The Catalytic Site Atlas (CSA) is a database documenting enzyme active sites and catalytic residues in enzymes of 3D structure. We defined a classification of catalytic residues which includes only those residues thought to be directly involved in some aspect of the reaction catalysed by an enzyme. For a full description of the classification, see Reference 2.
The CSA contains 2 types of entry:
- Original hand-annotated entries, derived from the primary literature. References for these entries are given.
- Homologous entries, found by PSI-BLAST alignment (using an e value cut-off of 0.00005) to one of the original entries. The equivalent residues, which align in sequence to the catalytic residues found in the original entry are documented.
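As an illustration of the e-value cut-off mentioned above, hits in BLAST tabular output (12 tab-separated columns, with the e-value in column 11) can be filtered with awk. This is a sketch under assumptions, not the CSA's actual pipeline: filter_hits and sample_hits are hypothetical names, the sample rows are invented, and treating the cut-off as inclusive is a guess.

```shell
#!/bin/bash
# filter_hits [CUTOFF]
# Reads tab-separated BLAST tabular rows on stdin and keeps those whose
# e-value (column 11) does not exceed CUTOFF (default: 0.00005).
filter_hits() {
    local cutoff=${1:-0.00005}
    awk -F'\t' -v cutoff="$cutoff" '($11 + 0) <= (cutoff + 0)'
}

# Invented example rows: query, subject, %identity, length, mismatches,
# gap opens, q.start, q.end, s.start, s.end, e-value, bit score.
sample_hits() {
    printf 'query1\thitA\t45.0\t200\t110\t2\t1\t200\t5\t204\t1e-30\t120\n'
    printf 'query1\thitB\t30.0\t180\t126\t4\t3\t182\t9\t188\t0.00001\t55\n'
    printf 'query1\thitC\t25.0\t90\t67\t3\t10\t99\t20\t109\t0.001\t32\n'
}

sample_hits | filter_hits
```

With the default cut-off, the hitA and hitB rows pass and hitC is dropped.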
Access to the CSA is via PDB code, SWISS-PROT entry or E.C. number. Accessing via PDB code takes you straight to the CSA entry for that PDB, while accessing via SWISS-PROT or E.C. number gives a list of all PDB codes for structures assigned that particular SWISS-PROT identifier or E.C. number. Structures with entries in the CSA are given as hyperlinks.
Each CSA entry lists the catalytic residues found in that entry, using PDB residue numbering. Each site is also marked with an evidence tag, which is either "Literature reference" or "PSI-BLAST hit". If the entry is a PSI-BLAST hit you can follow the link to the original entry. The active site can be visualised using RasMol.
Each entry contains a link to a list of homologous entries found by PSI-BLAST, and a link to other PDB structures with identical E.C. numbers or SWISS-PROT identifier to the entry you are viewing.
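References:
1. The Catalytic Site Atlas: a resource of catalytic sites and residues identified in enzymes using structural data. Craig T. Porter, Gail J. Bartlett, and Janet M. Thornton (2004) Nucl. Acids Res. 32: D129-D133.
2. Analysis of Catalytic Residues in Enzyme Active Sites. Gail J. Bartlett, Craig T. Porter, Neera Borkakoti, and Janet M. Thornton (2002) J. Mol. Biol. 324: 105-121.
3. Using a Library of Structural Templates to Recognise Catalytic Sites and Explore their Evolution in Homologous Families. James W. Torrance, Gail J. Bartlett, Craig T. Porter, and Janet M. Thornton (2005) J. Mol. Biol. 347: 565-581.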